ABSTRACT



The American College Testing Program offers research services through which colleges can predict
the freshman grades of their future students. This paper describes research done to establish a
minimum sample size requirement for calculating prediction equations for college freshman grade
average. Results from all the studies suggest that eight-variable prediction equations based on
representative samples of size 50 would have almost the same accuracy as prediction equations
based on larger samples.




DETERMINING MINIMUM SAMPLE SIZES FOR MULTIPLE 
REGRESSION GRADE PREDICTION EQUATIONS FOR COLLEGES 



Richard Sawyer



The American College Testing Program (ACT) offers
research services through which colleges can predict
the freshman grades of their future students (The
American College Testing Program, 1983). The students'
predicted grades are based on their ACT test
scores in English, mathematics, social studies, and
natural sciences, and on their self-reported high school
grades in these four subject areas. The predicted
grades are calculated by weighting the test scores and
high school grades in least-squares regression equations
that are specific to each college.¹



The weights in a college's prediction equation are
usually calculated from data on all students in a
previous freshman class who took the ACT. Because
these weights are estimates whose accuracy depends
on the size of the base sample used to calculate them,
and because error in estimating the weights propagates
error in prediction, the freshman class size affects
prediction error. It is possible, therefore, that weights
calculated from very small freshman classes could be
subject to large sampling errors, resulting in predictions
of unacceptable accuracy.



One way to mitigate the effect of small sample sizes on
prediction accuracy is to use information collaterally
from several colleges in constructing prediction equations.
Novick et al. (1972) further developed a Bayesian
model due to Lindley (1970) in which this method was
used. Novick et al. calculated for m = 22 junior colleges
the standard least-squares and the Bayesian "m-group"
prediction equations for freshman grade average, using
the four ACT test scores as predictors. The mean
number of students in the 22 colleges was approximately
246. Novick et al. then cross-validated the
prediction equations against the following year's freshmen
at these colleges. They obtained an average
cross-validated Mean Absolute Error (MAE) of .58
grade units for both the least-squares and the Bayesian
m-group prediction methods. When the prediction
equations were developed from 25% samples of the
base year freshman classes, the resulting mean
cross-validated MAE was .61 grade units for the
least-squares and .59 grade units for the Bayesian method.
The results of Novick et al. suggest, therefore, that
four-variable least-squares predictions for freshman
classes with as few as 50 students would not be
grossly inaccurate. The results further suggest that the
Bayesian m-group method would yield more accurate
predictions than least-squares when sample sizes are
smaller than 50. Other centralized prediction methods,
such as that due to Dempster, Rubin, and Tsutakawa
(1981), also seem promising in this regard.

The focus of this paper is on standard least-squares
predictions, since they are still the most extensively
used predictions and are currently used by ACT. The
purpose of this study is to determine for how small a
college least-squares prediction equations can be
developed without significant degradation in prediction
accuracy. We shall regard as practically significant an
increase in MAE of 10% or more over that which would
occur at larger colleges.

One way to address this issue is to assume that the
freshmen in a college are a random sample from a
hypothetical population with postulated statistical
characteristics. Under this assumption, determining the
appropriate sample size for calculating prediction
weights becomes a mathematical problem of relating
measures of prediction accuracy to parameters of a
statistical model. Sawyer (1982) took this approach;
some of the results from that study are discussed later.

Students from colleges of different sizes may be
samples from different populations of students, insofar
as the predictability of their grades is concerned. Thus,
a college's size, as an institutional characteristic that
attracts certain kinds of students, could be related to
the predictive validity of ACT test scores and high
school grades. It is conceivable, for example, that the
grades of students enrolled in very small colleges
could be predicted more accurately than those of
students enrolled in larger colleges. Sawyer and Maxey
(1982) studied the sample size problem in this context;
they found little relationship between prediction accuracy
and college size for colleges with 90 or more
freshmen. They also hypothesized that predictions of
acceptable accuracy could be made for entire freshman
classes with as few as 50 students.



¹In practice, ACT averages the predictions from two four-variable
multiple regression equations based on test scores separately and
on high school grades separately. The accuracy of these predictions,
though, is virtually the same as that of predictions based on a
single eight-variable multiple regression equation (Sawyer and
Maxey, 1979).



As a result of these two studies, ACT lowered the
minimum sample size requirement for its predictive
research services from 100 to 75 students, effective for
1979-80 freshmen. In this paper, the accuracy of the
grade predictions at colleges with 75-100 freshmen is
summarized. The experience in predicting grades at
these colleges is then discussed in the context of the
previously cited studies. Finally, conclusions are drawn
about the accuracy of predictions at colleges with
fewer than 75 students.






Theoretical Considerations 



Suppose the regression coefficients in a prediction
equation are estimated from a random sample $(y_i, x_i')$,
$i = 1, \ldots, n$, where $y_i$ is the dependent variable and $x_i$
is a vector of $p$ predictor variables for the $i$-th case. (In
the application described above, $y_i$ is the college
freshman grade average and $p = 8$.) Suppose $x_i$ has a
multivariate normal distribution with mean $\mu$ and
covariance matrix $\Sigma$. The predictors $x_i$ are therefore
assumed to be random rather than fixed; this aspect of
the model reflects the inability of colleges to control
precisely the test scores and high school grades of
their entering freshmen.

The conditional distribution of $y_i$ given $x_i$ is assumed to
be normal with mean $(1, x_i')\beta$ and variance $\sigma^2$. The
regression coefficients are estimated by the usual
least-squares estimates

$$\hat{\beta} = (X'X)^{-1} X'y,$$

where $X$ is the $n \times (p+1)$ matrix whose $i$-th row is $(1, x_i')$
and $y' = (y_1, \ldots, y_n)$.

An additional independent observation $(y^*, x^{*\prime})$ is to be
taken, and $y^*$ is to be predicted by $\hat{y} = (1, x^{*\prime})\hat{\beta}$.

Sawyer (1982) studied the moments of the distribution
of the prediction error $\hat{y} - y^*$. The mean of $\hat{y} - y^*$ is, of
course, 0; its standard deviation is

$$\mathrm{RMSE} = \sigma \cdot K(n,p), \quad \text{where} \quad K(n,p) = \sqrt{\frac{(n+1)(n-2)}{n(n-p-2)}}.$$



Sawyer found that when $K < 1.10$, the distribution of
$\hat{y} - y^*$ is approximately normal. In this case, the mean
absolute error of prediction, $\mathrm{MAE} = E(|\hat{y} - y^*|)$, is
approximately

$$\mathrm{MAE} \approx \sqrt{2/\pi}\;\mathrm{RMSE}.$$
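The two results above can be checked numerically. The following is a minimal Monte Carlo sketch and is not part of the original study. It assumes standard normal predictors with identity covariance, arbitrary true coefficients, and σ = 1 (all illustrative choices), and the names `inflation_factor` and `simulate_prediction_error` are hypothetical.

```python
import math
import numpy as np

def inflation_factor(n, p):
    """K(n, p) = sqrt((n + 1)(n - 2) / (n (n - p - 2))), defined for n > p + 2."""
    return math.sqrt((n + 1) * (n - 2) / (n * (n - p - 2)))

def simulate_prediction_error(n, p, sigma=1.0, reps=10000, seed=0):
    """Return empirical (RMSE, MAE) of y_hat - y* over `reps` simulated base samples."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=p + 1)  # arbitrary true coefficients, intercept first
    # Base samples: `reps` independent data sets of n cases with random predictors.
    x = rng.normal(size=(reps, n, p))
    X = np.concatenate([np.ones((reps, n, 1)), x], axis=2)
    y = X @ beta + sigma * rng.normal(size=(reps, n))
    # Least-squares estimates from the normal equations, one per data set.
    Xt = np.transpose(X, (0, 2, 1))
    beta_hat = np.linalg.solve(Xt @ X, Xt @ y[..., None])[..., 0]
    # One additional independent observation per data set.
    x_new = np.concatenate([np.ones((reps, 1)), rng.normal(size=(reps, p))], axis=1)
    y_new = x_new @ beta + sigma * rng.normal(size=reps)
    err = np.sum(x_new * beta_hat, axis=1) - y_new
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))

rmse, mae = simulate_prediction_error(n=50, p=8)
print("empirical RMSE:", round(rmse, 3), " sigma * K(50, 8):", round(inflation_factor(50, 8), 3))
print("empirical MAE :", round(mae, 3), " sqrt(2/pi) * RMSE:", round(math.sqrt(2 / math.pi) * rmse, 3))
```

Under these assumptions the empirical RMSE should fall close to σK(n,p), and the empirical MAE close to √(2/π) times the RMSE, up to simulation noise.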

The function $K(n,p)$ is an inflation factor due to estimating
the regression coefficients; as $n \to \infty$, $K(n,p) \to 1$.
For fixed values of $K$ and $p$, one can approximate the
corresponding required base sample size $n$ by

$$n \approx \frac{2K^2 - 1}{K^2 - 1} + \frac{K^2}{K^2 - 1}\,p. \tag{1}$$



The coefficients in (1) are displayed in Table 1 for
several values of K and p. They suggest that in
predicting college freshman grade average from an
eight-variable multiple regression equation, a base
sample size of approximately 53 would result in a 10%
inflation in RMSE or MAE over that which would result
if the population values of the coefficients were known.



TABLE 1

Approximate Relationship between Number of Predictors
and Sample Size Required for Varying Degrees of Prediction Accuracy

Inflation factor (K)     Approximate required sample sizeᵃ

1.01                     n = 50.8p + 51.8
1.05                     n = 10.8p + 11.8
1.10                     n = 5.8p + 6.8
1.25                     n = 2.8p + 3.8
1.50                     n = 1.8p + 2.8

ᵃApproximate base sample size (n) needed to achieve $\mathrm{MAE} = K\sigma\sqrt{2/\pi}$ with 1 < p < 20 predictors.
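As a worked check on approximation (1) and Table 1, the sketch below evaluates the inflation factor, the approximate sample size, and (by direct search) the smallest n that meets a given K. The function names are illustrative and do not come from the original report.

```python
import math

def inflation_factor(n, p):
    """K(n, p) = sqrt((n + 1)(n - 2) / (n (n - p - 2))); requires n > p + 2."""
    if n <= p + 2:
        raise ValueError("n must exceed p + 2")
    return math.sqrt((n + 1) * (n - 2) / (n * (n - p - 2)))

def approx_sample_size(K, p):
    """Approximation (1): n ~ (2K^2 - 1)/(K^2 - 1) + [K^2/(K^2 - 1)] p."""
    k2 = K * K
    return (2 * k2 - 1) / (k2 - 1) + k2 / (k2 - 1) * p

def exact_sample_size(K, p):
    """Smallest integer n with K(n, p) <= K, found by direct search."""
    n = p + 3
    while inflation_factor(n, p) > K:
        n += 1
    return n

# Slope and intercept of approximation (1) for the K values listed in Table 1.
for K in (1.01, 1.05, 1.10, 1.25, 1.50):
    k2 = K * K
    print(f"K = {K:.2f}: n ~ {k2 / (k2 - 1):.1f}p + {(2 * k2 - 1) / (k2 - 1):.1f}")

# Eight predictors and a 10% inflation factor, as in the text.
print(round(approx_sample_size(1.10, 8), 1), exact_sample_size(1.10, 8))
```

For K = 1.10 and p = 8, both the approximation (rounded) and the direct search give a base sample size of 53, in agreement with the figure quoted in the text.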



Empirical Research 



Sawyer and Maxey (1982) examined the accuracy of
prediction equations at a random sample of 205 colleges
that participated in the ACT Research Services
in 1974-75 and in 1976-77. A separate prediction
equation for each college was calculated from its
1974-75 data. Then, each resulting prediction equation
was applied to data for the 1976-77 freshmen, and the
predicted and actual grade averages were compared.
(The two-year lag between base year and cross-validation
year reflects the time lag encountered by
colleges in developing and using prediction equations.)

The cross-validation statistics in Table 2 are summarized
for five categories of colleges defined by their
base sample size. The statistics P20, P50, and P100
refer to the proportion of students in a college whose
predicted grade averages were within .20, .50, or 1.00
grade units, respectively, of their actual grade averages.
The statistic CVR is the correlation between earned
and predicted grade average in a college. The numbers
in Table 2 are mean values of these cross-validation
statistics among colleges in the sample.
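The statistics just defined can be computed for a single college as in the following sketch. The function name and the grade values are made up for illustration, and reading "within" as an absolute error less than or equal to the limit is an assumption.

```python
import math

def cross_validation_stats(actual, predicted):
    """Summarize prediction accuracy for one college's cross-validation group."""
    n = len(actual)
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

    def proportion_within(limit):
        # "Within" is read here as absolute error <= limit (an assumption).
        return sum(e <= limit for e in abs_errors) / n

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "MAE": sum(abs_errors) / n,               # mean absolute error
        "P20": proportion_within(0.20),           # share within .20 grade units
        "P50": proportion_within(0.50),           # share within .50 grade units
        "P100": proportion_within(1.00),          # share within 1.00 grade units
        "CVR": cov / math.sqrt(var_a * var_p),    # correlation, earned vs. predicted
    }

# Made-up grade averages for four students, for illustration only.
actual = [2.0, 3.0, 2.5, 3.5]
predicted = [2.1, 2.6, 3.2, 2.9]
stats = cross_validation_stats(actual, predicted)
print(stats)
```

The entries in Table 2 would then correspond to averaging each of these per-college statistics over the colleges in a size category.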



Table 2 indicates that the predictive validity of ACT
test scores and high school grades is only weakly
related to freshman class size at colleges with 90 or
more freshmen. For example, the average observed
MAE ranged from .51 to .54 grade units over the five
size categories. Similarly, the average cross-validated
correlation ranged from .53 to .56 over the five size
categories.



TABLE 2

Mean Cross-Validation Statistics, by Size of College Freshman Class
(Total Group Equation)

Size           Number of   Number of               Cross-validation statistic
category       colleges    students (1976)    MAE    P20    P50    P100   CVR

90-100             15            2,544        .52    .25    .57    .87    .53
101-200            76           11,007        .51    .26    .59    .89    .55
201-500            50           15,951        .54    .24    .56    .87    .56
501-1000           35           29,603        .54    .24    .56    .87    .55
1001+              29           55,773        .53    .25    .57    .87    .56
All colleges      205          114,878        .53    .25    .57    .88    .55


Because of ACT's sample size requirements in effect
at the time of the Sawyer and Maxey study, there were
no colleges with total group sample sizes below 90. To
obtain evidence about prediction accuracy for sample
sizes below 90, albeit indirect, Sawyer and Maxey
developed prediction equations from random subsamples
of the 1974-75 freshman data from each
college. The results, shown in Table 3, indicate that the
MAEs associated with grade predictions based on
random subsamples of size 50 are within 10% of the
MAEs associated with predictions based on all records.
Therefore, although direct evidence of the accuracy of
grade predictions for colleges with fewer than 90
students was not available, it appeared that grade
predictions of comparable accuracy could be made at
colleges with as few as 50 freshmen.
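The subsampling procedure described above can be sketched as follows. This is an illustration on synthetic data, not a reproduction of the Sawyer and Maxey analysis: the single hypothetical college, its 300 students per year, the true coefficients, and the residual standard deviation of .65 are all assumptions, as are the function names.

```python
import numpy as np

def fit_least_squares(x, y):
    """Least-squares coefficients (intercept first) for predictors x and grades y."""
    X = np.column_stack([np.ones(len(y)), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

def mean_absolute_error(beta_hat, x, y):
    """MAE of the fitted equation applied to a cross-validation group."""
    X = np.column_stack([np.ones(len(y)), x])
    return float(np.mean(np.abs(X @ beta_hat - y)))

def subsample_mae(x_base, y_base, x_cv, y_cv, size, rng, reps=200):
    """Cross-validated MAE averaged over `reps` random subsamples of the base year."""
    maes = []
    for _ in range(reps):
        idx = rng.choice(len(y_base), size=size, replace=False)
        beta_hat = fit_least_squares(x_base[idx], y_base[idx])
        maes.append(mean_absolute_error(beta_hat, x_cv, y_cv))
    return float(np.mean(maes))

# Synthetic stand-in for one college (all values here are assumptions).
rng = np.random.default_rng(0)
p, n_students = 8, 300
beta = rng.normal(scale=0.2, size=p + 1)

def make_class(n):
    x = rng.normal(size=(n, p))
    y = beta[0] + x @ beta[1:] + rng.normal(scale=0.65, size=n)
    return x, y

x_base, y_base = make_class(n_students)   # base-year freshmen
x_cv, y_cv = make_class(n_students)       # cross-validation-year freshmen

mae_all = mean_absolute_error(fit_least_squares(x_base, y_base), x_cv, y_cv)
results = {m: subsample_mae(x_base, y_base, x_cv, y_cv, m, rng) for m in (25, 50, 75, 100)}
for m, mae in results.items():
    print(f"subsample size {m:3d}: mean MAE = {mae:.3f} ({mae / mae_all:.2f} x all records)")
print(f"all records        : MAE = {mae_all:.3f}")
```

The ratio printed for each subsample size is the quantity compared against the 10% criterion in the text.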



TABLE 3

Mean College Cross-Validation Statistics for Prediction
Equations Derived from Subsamples of Base Year Data

Size of subsample            Cross-validation statistics
of base year data      MAE     P20     P50     P100    CVR

25                     .65     .21     .48     .79     .41
50                     .5?     .23     .54     .85     .49
75                     .55     .24     .55     .87     .52
100                    .54     .24     .56     .88     .53
All records            .53     .25     .57     .88     .55



Follow-up Study 




As a result of the two studies above, ACT lowered the
minimum sample size requirement for its predictive
research services from 100 to 75 students, effective for
1979-80 freshmen. Following is an examination of the
accuracy of the grade predictions at the colleges
whose sizes are in this range. Further evidence is also
presented on the likely prediction accuracy at colleges
with fewer than 75 students.



Prediction equations for freshman grade average were
developed from the 1979-80 freshman grade data at all
colleges with between 70 and 100 freshmen. (To
accommodate small colleges with a few unexpectedly
invalid records, ACT used an actual cut-off of five
records less than the published cut-off of 75.) Separate
subgroup equations were also developed for the males
and females at each college. The prediction equations
were then cross-validated against the grades of the
1981-82 freshmen at each college.



The results for the total group prediction equations,
contained in Table 4a, confirm the expectation that
predictions based on as few as 75 students would be
about as accurate as predictions based on larger
numbers of students. The mean MAE for colleges with
70-79 freshmen, for example, was .51 grade units; the
same mean MAE was observed for colleges with
90-100 freshmen. In the Sawyer and Maxey study cited
above, the mean MAE for colleges with 90-100 freshmen
was .52 grade units, and the mean MAE for all
colleges was .53 grade units.

It is interesting to note in Table 4a that the mean MAE
for colleges with 80-89 freshmen (.55 grade units) is
actually larger than the mean MAE for colleges with
70-79 freshmen (.51 grade units). This result might
reflect differences in the predictive validity of the ACT
at colleges in these two size categories. Given the
estimated standard errors for these means, however,
the differences could also be reasonably thought of as
due to chance.
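One rough way to see why chance is a reasonable explanation is to compare the difference between the two means with its standard error. The sketch below uses the means and standard errors reported in Table 4a; treating the two means as independent and approximately normal is an assumption made here, not a procedure stated in the report.

```python
import math

def difference_z(mean1, se1, mean2, se2):
    """z statistic for the difference of two independent estimated means."""
    return (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Mean MAE (standard error): 80-89 freshmen .55 (.02); 70-79 freshmen .51 (.01).
z = difference_z(0.55, 0.02, 0.51, 0.01)
print(f"z = {z:.2f}")
```

The resulting z is about 1.8, below the conventional two-sided 5% critical value of 1.96, which is consistent with the remark in the text.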




TABLE 4a

Mean Cross-Validation Statistics, by Size of College Freshman Class
(Total Group Equation)

Size           Number of     Number of                  Mean cross-validation statisticsᵃ
category       colleges      students (1981)    MAE        P20        P50        P100       CVR

70-79          33            2,643              .51(.01)   .24(.01)   .58(.01)   .89(.01)   .52(.02)
80-89          [illegible]   2,000              .55(.02)   .25(.02)   .56(.02)   .85(.02)   .46(.03)
90-100         [missing]       849              .51(.03)   .28(.02)   .60(.02)   .88(.03)   .51(.03)
All colleges   68            5,492              .53(.01)   .25(.01)   .58(.01)   .88(.01)   .49(.02)

ᵃNumbers in parentheses are estimated standard errors corresponding to the estimated means.



The results for the separate subgroup equations for
males are contained in Table 4b. The mean MAE was
.63 grade units for predictions based on 30-39 males,
and .58 grade units for predictions based on 40-49
males. In the Sawyer and Maxey (1982) study, the
mean MAE over all colleges was also .58 grade units.
Therefore, it would appear that prediction equations
based on as few as 40-49 males are about as accurate
as predictions based on larger numbers of males.

The results for the separate subgroup equations for
females are contained in Table 4c. The mean MAE was
.47 grade units for predictions based on 50-59 females.
In the Sawyer and Maxey (1982) study, the mean MAE
over all colleges was .52 grade units. Therefore, prediction
equations based on as few as 50-59 females are
about as accurate as predictions based on larger
numbers of females.



Because of the minimum sample size requirement now
in effect, cross-validation statistics are not reported in
Table 4a for colleges with fewer than 70 freshmen. The
results in Tables 4b and 4c for the separate subgroup
equations suggest, however, that total group equations
developed from samples of as few as 40-50 freshmen
would have nearly the same prediction accuracy as
total group equations developed from larger samples.



TABLE 4b

Mean Cross-Validation Statistics, by Number of Males in Freshman Class
(Separate Subgroup Equation for Males)

Size            Number of   Number of                     Mean cross-validation statisticsᵃ
category        colleges    students (1981)    MAE        P20                P50        P100       CVR

25-29            6          208                .74(.04)   .13(.03)           .35(.04)   .71(.03)   .47(.03)
30-39            8          293                .63(.04)   .21(.03)           .52(.03)   .81(.03)   .36(.07)
40-49            6          220                .58(.03)   .22([illegible])   .52(.02)   .84(.03)   .39(.08)
50 and above     1           75                .41(-)     .36(-)             .71(-)     .93(-)     .54(-)
All colleges    21          796                .64(.03)   .20(.02)           .48(.03)   .80(.02)   .41(.04)

ᵃNumbers in parentheses are estimated standard errors corresponding to the estimated means.



TABLE 4c

Mean Cross-Validation Statistics, by Number of Females in Freshman Class
(Separate Subgroup Equation for Females)

Size             Number of   Number of                  Mean cross-validation statisticsᵃ
category         colleges    students (1981)    MAE        P20        P50        P100       CVR

25-29             5            147              .58(.12)   .27(.05)   .51(.08)   .84(.08)   .47(.09)
30-39             8            285              .55(.03)   .25(.02)   .56(.03)   .86(.03)   .38(.04)
40-49            20            858              .56(.03)   .25(.02)   .56(.03)   .86(.02)   .39(.04)
50-59            12            530              .47(.03)   .27(.02)   .63(.02)   .91(.03)   .57(.03)
60 and aboveᵇ     3            133              .51(.06)   .24(.02)   .63(.05)   .88(.03)   .51(.09)
All colleges     48          1,953              .54(.02)   .26(.01)   .58(.02)   .87(.01)   .45(.03)

ᵃNumbers in parentheses are estimated standard errors corresponding to the estimated means.
ᵇMaximum sample size was 72.
Conclusions






Results from studies by Novick et al. (1972), Sawyer
and Maxey (1982), and Sawyer (1982) suggest the
likelihood that least-squares grade prediction equations
based on data for as few as 50 students would be
about as accurate as prediction equations based on
much larger samples. The present study confirms that
total group predictions based on 70 or more students
have the same accuracy as predictions based on large
samples. Moreover, the results from separate-sex
prediction equations lend further support to the idea
that a base sample size as low as 50 would be
satisfactory.

One should keep in mind that these sample size
recommendations pertain to entire freshman classes
or to representative samples of freshman classes.
Prediction equations based on greatly nonrepresentative
samples may result in larger prediction errors
when applied to more general student populations.